Heuristic Search Value Iteration for POMDPs

نویسندگان

  • Trey Smith
  • Reid G. Simmons
چکیده

We present a novel POMDP planning algorithm called heuristic search value iteration (HSVI). HSVI is an anytime algorithm that returns a policy and a provable bound on its regret with respect to the optimal policy. HSVI gets its power by combining two well-known techniques: attention-focusing search heuristics and piecewise linear convex representations of the value function. HSVI’s soundness and convergence have been proven. On some benchmark problems from the literature, HSVI displays speedups of greater than 100 with respect to other state-of-the-art POMDP value iteration algorithms. We also apply HSVI to a new rover exploration problem 10 times larger than most POMDP problems in the literature.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Solving POMDPs by Searching in Policy Space

Most algorithms for solving POMDPs itera­ tively improve a value function that implic­ itly represents a policy and are said to search in value function space. This paper presents an approach to solving POMDPs that repre­ sents a policy explicitly as a finite-state con­ troller and iteratively improves the controller by search in policy space. Two related al­ gorithms illustrate this approach. ...

متن کامل

Markov decision processes with observation costs

A partially observable Markov decision process (POMDP) is a generalization of a Markov decision process in which observation of the process state can be imperfect and/or costly. Although it provides an elegant model for control and planning problems that include information-gathering actions, the best current algorithms for POMDPs are computationally infeasible for all but small problems. One a...

متن کامل

Theoretical Analysis of Heuristic Search Methods for Online POMDPs

Planning in partially observable environments remains a challenging problem, despite significant recent advances in offline approximation techniques. A few online methods have also been proposed recently, and proven to be remarkably scalable, but without the theoretical guarantees of their offline counterparts. Thus it seems natural to try to unify offline and online techniques, preserving the ...

متن کامل

Scaling POMDPs for dialog management with composite summary point-based value iteration (CSPBVI)

Although partially observable Markov decision processes (POMDPs) have shown great promise as a framework for dialog management in spoken dialog systems, important scalability issues remain. This paper tackles the problem of scaling slot-filling POMDP-based dialog managers to many slots with a novel technique called composite point-based value iteration (CSPBVI). CSPBVI creates a “local” POMDP p...

متن کامل

Symbolic Heuristic Search Value Iteration for Factored POMDPs

We propose Symbolic heuristic search value iteration (Symbolic HSVI) algorithm, which extends the heuristic search value iteration (HSVI) algorithm in order to handle factored partially observable Markov decision processes (factored POMDPs). The idea is to use algebraic decision diagrams (ADDs) for compactly representing the problem itself and all the relevant intermediate computation results i...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004